Abstract. This paper reviews the role of expert judgement to support
reliability assessments within the systems engineering design process. 
Generic design processes are described to give the context and a dis- 
cussion is given about the nature of the reliability assessments required 
in the different systems engineering phases. It is argued that, as far 
as meeting reliability requirements is concerned, the whole design pro- 
cess is more akin to a statistical control process than to a straight- 
forward statistical problem of assessing an unknown distribution. This 
leads to features of the expert judgement problem in the design context 
which are substantially different from those seen, for example, in risk 
assessment. In particular, the role of experts in problem structuring 
and in developing failure mitigation options is much more prominent, 
and there is a need to take into account the reliability potential for 
future mitigation measures downstream in the system life cycle. An 
overview is given of the stakeholders typically involved in large scale 
systems engineering design projects, and this is used to argue the need 
for methods that expose potential judgemental biases in order to gen- 
erate analyses that can be said to provide rational consensus about 
uncertainties. Finally, a number of key points are developed with the 
aim of moving toward a framework that provides a holistic method for 
tracking reliability assessment through the design process. 
1. INTRODUCTION

Statistics is considered one of the major contrib- 
utors to the development of reliability engineering 
as a technical discipline [131]. Recent reviews of the 
role of statistics within reliability engineering [12, 
82, 92, 102] underline the continued need for sta- 
tistical science to help engineers assess sources of 
uncertainty, design sound data collection systems, 
and develop models for combining data and quantifying
uncertainty. However, it is also recognized that
the role of statistical science within the engineer- 
ing process needs to broaden to accommodate the 
additional complexities of the technological systems 
as well as the operational contexts. One particu- 
lar challenge is the need to structure and integrate 
statistical modeling within the systems engineering 
process to support decision-making aimed at obtain- 
ing a sufficient and cost effective state of knowledge 
about future system reliability. This implies that
judgemental, as well as objective, data should be 
collected responsibly and used formally. 

This paper aims to survey and review the use of 
subjective expert judgement methods to assess reli- 
ability in the design process. We have deliberately 
chosen to interpret the scope of these terms in a 
relatively broad fashion. Thus "expert judgement" 
refers to any structured method of acquiring knowl- 
edge from experts; "reliability" covers the broader 
issues of reliability, availability and maintainability 
(RAM); and the "design process" is considered to 
include within its scope a consideration of how the 
system is to be manufactured, how users will in- 
teract with it and how it will be maintained. More 
specifically, since reliability measures are usually ex- 
pressed in probabilistic terms, we consider the use of 
expert judgement to structure probabilistic models 
and to quantify uncertainties in the development of 
a reliable design. 

The standard definition of reliability, "the abil- 
ity of a system to perform a required function un- 
der stated conditions for a stated period of time" 
[70], naturally translates into a probability measure. 
While empirical reliability can only be properly as- 
sessed after a system is in use, there is a need to fore- 
cast reliability during the design process to support 
analysis aimed at improving reliability. Davis [33] 
supported the definition found in [23] that "reliabil- 
ity is failure mode avoidance." We are sympathetic 
to this view since identifying and mitigating influ- 
ential critical failure modes will cause reliability to 
improve. However, we also believe that probabilistic 
models have an important role to play in support- 
ing design decisions since they allow data integration 
and assist prioritization. 

Reliability is a recognized element of systems en- 
gineering and systems design. However, it is worth 
recognizing from the outset how difficult it is to talk 
about the reliability of a system. In part the dif- 
ficulty has to do with ambiguity of any reliability 
metric. In modern systems engineering the practice 
of requirements setting should, if carried out well, 
result in a coherent set of reliability requirements ex- 
pressed in terms of well-defined RAM metrics. Hence 
good engineering-management practice should en- 
sure that there is little ambiguity in the expression of 
reliability requirements. More difficult though is the 
uncertainty around the circumstances under which 
those requirements are to be met. The reliability of 
a system is ultimately determined by a combination 



of factors. Simplistically, we may think of the reliability
of a specific system as being determined by the
detailed design reliability as modulated by induced
unreliabilities coming from the manufacturing process,
from the users, from maintenance and from
modifications. Detailed design reliability gives
the maximum potential reliability, which
manufacturing errors, poor usage and poor maintenance
will typically reduce, while changes or modifications
introduced as a result of experience with
the equipment will improve the reliability, that is,

    overall reliability = designed reliability
                          - production unreliability
                          - usage unreliability
                          - maintenance unreliability
                          + changes reliability.

More compactly, we could write a chosen reliability 
measure r as 

r = r(d, p, u, m, c),

where d, p, u, m and c represent the choices made for
design, production, usage, maintenance and changes. 
Inasmuch as systems engineering is about making 
trade-offs between different aspects of the system, 
the major focus for expert judgement techniques in 
support of reliability has to be to explore the be- 
havior of, and even quantify, the above conceptual 
function in some way. 
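As a purely illustrative sketch of what exploring such a function could look like, the additive decomposition above can be coded directly. The class name, the field names, the 0-1 scoring scale and all numerical values are assumptions introduced here for illustration; they are not taken from any cited method.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Choices:
    """Illustrative scores on a common 0-1 scale (all values hypothetical)."""
    designed: float      # d: reliability potential of the detailed design
    production: float    # p: unreliability induced by manufacturing
    usage: float         # u: unreliability induced by users
    maintenance: float   # m: unreliability induced by maintenance
    changes: float       # c: reliability recovered by later modifications


def overall_reliability(x: Choices) -> float:
    """Additive form of r(d, p, u, m, c), clipped to the interval [0, 1]."""
    r = x.designed - x.production - x.usage - x.maintenance + x.changes
    return min(1.0, max(0.0, r))


baseline = Choices(designed=0.95, production=0.03, usage=0.02,
                   maintenance=0.02, changes=0.01)
# One-at-a-time exploration: maintenance-induced unreliability is doubled.
worse_maintenance = replace(baseline, maintenance=0.04)

print(overall_reliability(baseline))
print(overall_reliability(worse_maintenance))
```

Varying one argument at a time in this way is only the crudest form of exploration; in practice the arguments interact and the additive form is itself a simplification.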

The existing expert judgement literature is a start- 
ing point for elicitation problems in engineering de- 
sign, but it needs to be extended to cope with the 
unique problems encountered. This is one of the mo- 
tivations for the present paper. In discussing the 
ways in which expert judgement methods are adopted 
to assess uncertainties in the design process we shall 
consider both the academic and foundational as- 
pects as well as the typical business context so that 
we can gain an understanding of why simpler meth- 
ods are not replaced in practice by better founded 
methods. 

The paper is structured as follows. After describ- 
ing the systems engineering life cycle phases, we ex- 
amine the role of the stakeholders within key mar- 
kets and their influence on reliability modeling in- 
tentions. Summaries of the existing literature in elic- 
itation are woven into a discussion of the issues that 
arise during model structuring, instantiation and 
updating across the systems engineering process. We 
conclude by suggesting areas in need of further re- 
search. 
2. SYSTEMS ENGINEERING AND
DESIGN PHASES 

Systems engineering is described in the NASA Sys- 
tems Engineering Handbook [132] as 

... a robust approach to the design, cre- 
ation, and operation of systems. In simple 
terms, the approach consists of identifi- 
cation and quantification of system goals, 
creation of alternative system design con- 
cepts, performance of design trades, selec- 
tion and implementation of the best de- 
sign, verification that the design is prop- 
erly built and integrated, and post-imple- 
mentation assessment of how well the sys- 
tem meets (or met) the goals. 

Reliability is regarded as an important specialism 
that supplies expertise to the systems engineering 
process [14, 15]. However, the nature of reliability 
knowledge and the demands placed on its practitioners
change considerably through the systems
engineering process. It is therefore useful to consider 
the main stages in the design process. 

2.1 Life Cycle Phases 

The phases described here, based on the most re- 
cent international standards [69], are generic, but 
descriptions of design phases vary in the literature 
[14]. 

• Concept and definition. Requirements definition 
is the generation of technical design constraints 
on the system. Some of these will be derived from 
information about user demands or expected user 
wishes, while others will be there to ensure feasi- 
bility of the design. Trade-off studies are carried 
out in order to achieve cost-effectiveness and feasi- 
bility. Finally, initial life cycle costing studies will 
be made. 

• Design and development. The system architecture 
is specified in detail, hard- and software will be 



built, tested and refined, leading where necessary 
to adjustments of the specification. Verification 
and validation of subsystem integration is carried
out: verification ensures that subsystem interfaces
conform to design specifications and validation
ensures that the integrated systems fulfill
their intended function. Maintainability analysis 
will be carried out and end of life disposal will be 
considered. 

• Manufacturing and installation. Hardware will be 
produced and software will be replicated. There is 
an emphasis on process control, although further 
product verification and validation will take place. 
Field trials may be used as a final check on system 
performance. 

• Operation and maintenance. The system in use 
should be monitored for performance. Maintenance 
also provides clues as to system performance and 
can be adjusted where necessary. 

• Disposal. Depending on the regulatory context, 
the system may be destroyed, dumped or disman- 
tled. Increasingly there is pressure from regula- 
tory authorities for reuse of equipment subsys- 
tems, so this stage is by no means the end for 
the system components. 

Figure 1 illustrates the relationships between the 
different systems engineering phases. Prior to opera- 
tion, reliability estimates forecast true performance 
and will encompass the uncertainties in future de- 
cisions. As system-specific observations are collated 
during development and manufacture, some uncer- 
tainties should be resolved and reflected in revised 
estimates. This is shown schematically in Figure 2. 

Feedback loops exist both within and between pha- 
ses, reflecting the analogy with a control system. 
Figures 3-6 show more detailed activities with each 
flow chart capturing the cyclic nature of the pro- 
cess to refine the system design based on assess- 
ment against reliability requirements. Information 
from subsequent phases should be fed back to ear- 
lier phases with a view to modifying the current de- 
sign, if required, but also to inform processes and 
Fig. 2. Decreasing uncertainty in future system reliability.



procedures that will impact later generations. How- 
ever, feeding data backward is only possible when 
the systems engineering phases overlap. Hence much 
of the data being fed back is judgemental in nature. 
We shall return to these flow charts later when we 
discuss issues relating to the role of elicitation. 

2.2 Stakeholders in System Design 

As mentioned above, these phases are generic and 
hence relevant to the markets for consumer, indus- 
trial and military systems. However, there may be 
differences in the nature of reliability knowledge and 
modeling within each market, and we explore this 
further through consideration of the key stakehold- 
ers with interests in following the reliability assess- 
ment of a new system, namely requirements specifi- 
cation team, design team, component manufacturer, 
lead manufacturer, sellers, regulators, end users, gen- 
eral public, maintainers and disposers/recyclers. 

These parties can be classed within one of four 
groups: client, manufacturer, regulator and public. 
These stakeholders can, and often do, take differ- 
ent viewpoints about the reliability of the system 
and about the relevance of data. For example, Ta- 
ble 1 captures the respective roles of the groups 
and aims to illustrate two key points. First, that 
different stakeholders may have different modeling 
intentions with, for example, manufacturers using 
models to measure reliability to support decisions 
about accommodation of failure modes and improve- 
ment activities, while clients may use models to sup- 



port negotiation with manufacturers. Second, that 
during such negotiations different stakeholders may 
be using the same data to support different sides 
of a decision. This latter situation mirrors that
found in probabilistic risk assessment, and indeed in
other areas where different parties are asked to
adopt a common view of uncertainties. This was
Cooke's motivation for the notion of "rational
consensus" [27]. See also the extensive literature on risk
communication and public perceptions, for example, 
[54, 127, 128, 133, 136]. 

3. ELICITATION IN RISK AND RELIABILITY 

Subjective expert judgement has a very impor- 
tant role to play in assessing uncertainties in the 
design process, with many of the stakeholders iden- 
tified above contributing their expertise. However, 
the emphasis is somewhat different from the role 
that expert judgement has in other areas — most no- 
tably in probabilistic risk assessment (PRA). Much 
of the modern academic literature on expert judge- 
ment has emerged from the need for structured sub- 
jective assessments in PRA. The key issues that 
emerge from this literature are reviewed. 

3.1 Roles within Elicitation 

In principle there are three distinct roles: 

• Decision-maker: This person is the problem owner, 
who is responsible for signing off on a decision and 
Fig. 3. Concept and definition flow chart.



wishes to be informed about relevant uncertain- 
ties by appropriate experts. 

• Expert: This person is identified as a domain ex- 
pert and contributes his or her own assessment on 
the events of interest. 

• Analyst: This person is responsible for identifying 
experts and events of interest, and writing the as- 
sessment and combination schemes. 

It should be noted that [17] also distinguishes the
role of advisor-expert who essentially plays a role



somewhere between the above players, by support- 
ing, for example, the selection of experts and elic- 
itation questions. The distinction between roles is 
valuable because some schemes do not recognize the 
different roles of these players. Bayesian schemes in 
particular often merge the role of expert and ana- 
lyst by requiring that the analyst play the role of 
meta expert, by providing a prior that the expert 
data will subsequently update and/or by providing 
the likelihood function for the expert data. 
A distinction between the three roles defined above
would seem to be important, for example, in pub- 
lic sector decision-making where there is a need for 
transparency. Even in the private sector there is a 
benefit to be gained from transparency and a clear 
division of roles. However, it clearly also imposes a 
cost burden, for example, due to the degree of spe- 
cialism involved, and may therefore be less appro- 
priate in some contexts. 



3.2 Probability Elicitation Methods and 
Processes 

Research in experimental psychology has demon- 
strated that accurate subjective probabilities are un- 
obtainable by simply asking someone to provide a 
probability number; therefore an elicitation process 
is required [75, 103, 104]. Much of the research in 
elicitation is concerned with minimizing bias, which 
can result from a variety of causes. Four standard 
forms of bias are: motivational, which concerns the 
situation where the expert has an interest in a par-
ticular value for the parameter being assessed; cog- 
nitive, which can result from incoherently basing an 
assessment on a number of calculations; anchoring, 
which exists when assessments are derived by an ex- 
pert from adjusting previous assessments; and avail- 
ability, which concerns assigning higher likelihoods 
to events that are linked to more memorable histor- 
ical events. 



Clemen and Winkler [24] gave an overview of the 
state of the art with particular emphasis on risk 
analysis applications. O'Hagan and co-workers wrote 
a series of papers which probably encompass the 
most recent generally applicable work on elicitation 
[53, 72, 113, 114, 115]. An overview of the uses of 
expert judgement in engineering applications was 
given by Ayyub [7], although this work covers also 
aspects of fuzzy representations (about which the 
reader can find a review by Cooke [29] of a previous 
book by Ayyub and a reply by the author). Fitting
closely to the theme of this paper is the work of 
Booker and McNamara [17], which presented a very 
nice description of the process of determining prob- 
lems for expert judgement, selecting experts and the 
problems caused by possible biases. Cooke [27] gave 
an historical account of elicitation and also provided 
a number of different models for the combination of 
expert probability assessments, including the classi- 



cal method which has been quite successful in PRA 
applications; see [76] for a list of applications. 
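To make the idea of combining expert probability assessments concrete, the following minimal sketch implements a linear opinion pool, that is, a weighted average of the experts' probabilities. It is not an implementation of the classical method: in that method the weights are derived from the experts' performance on calibration questions, whereas here the weights are simply assumed to be given. The function name and all numbers are hypothetical.

```python
def linear_opinion_pool(probabilities, weights):
    """Weighted average of expert probabilities for a single event."""
    if not probabilities or len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per expert")
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("probabilities must lie in [0, 1]")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)  # normalize so that the weights sum to one
    return sum(w * p for w, p in zip(weights, probabilities)) / total


# Three hypothetical experts assess the probability of the same failure event.
assessments = [0.02, 0.05, 0.10]
equal = linear_opinion_pool(assessments, [1, 1, 1])
# Assumed weights standing in for performance-based weights.
performance_based = linear_opinion_pool(assessments, [0.6, 0.3, 0.1])

print(equal, performance_based)
```

The contrast between the two results shows how strongly the pooled assessment can depend on the weighting scheme, which is why the choice of scheme needs to be transparent to all stakeholders.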

Expert judgement methods that draw on PRA are 
relevant to work on engineering design problems but 
are limited in two important ways. First, in the engi- 
neering design process there is a greater need to have 
experts define the problem structure, so the qual- 
itative phase of model building is relatively more 
important than it has historically been for PRA de- 
cision support. See, for example, Walls and Quigley 
Table 1
Stakeholder uses of reliability assessment

Manufacturer                        Client                              Regulator and Public

Acceptance of requirements          Specification of requirements
Design of reliability program       Acceptance of reliability program
Proof of meeting targets            Assurance of meeting targets        Safety case
Planned maintenance specification   Spares ordering



[146], who proposed an elicitation process to support 
reliability growth modeling. Second, there is a big 
difference in the way that events can be described: 
PRA elicitation is generally for very precisely de- 
fined events, while in engineering design problems 
it is much more difficult to describe events precisely 
because of extra uncertainty caused by the effect of 
future decision-making that surrounds the system 
and its use. In both cases there will always be un- 
specified states of the world for which the expert has 
to "fold in" his uncertainty. However, the degree of 
influence of future decision-making in the reliability 
engineering context is such that it becomes useful to 
model this explicitly in order to support the design 
process. 

These concepts are represented in the flow charts 
shown in Figures 3-6. For example, in the concept 
and definition phase we distinguish between the elic- 
itation of the qualitative failure modes and the quan- 
titative reliability estimates. Furthermore, sensitiv- 
ity analysis represents the exploration of future un- 
certainties using engineering judgements as inputs. 
In subsequent phases, previous judgements will be 
revisited and revised in light of observations from 
analysis and test tasks. 

3.3 Modeling Uncertainty in Design 

While systems engineers may well think in the 
holistic framework outlined in Section 2 and cap- 
tured within the flow charts in Figures 3-6, the 
statistical modeling generally applied is usually fo- 
cussed on tightly defined and highly specific issues 
within life cycle phases. The support that these tools 
give engineers is therefore fairly constrained. 

Uncertainty is fundamental to systems modeling 
and is worthy of further comment. Various authors 
have given overviews of different "types" of uncer- 
tainty. The classification given in [10] discusses 
aleatory and epistemic uncertainty, and suggests that 
the important distinction between them is model- 
dependent, as epistemic uncertainties are uncertain- 
ties that we wish to capture and adjust within a 



model through learning, whereas aleatory uncertain- 
ties are not adjusted within a model. The uncer- 
tainty in an abstract parameter or in a model type 
can be given an interpretation, according to [10], 
only in terms of the uncertainty it induces in observ- 
able outcomes. The above types of uncertainty can 
be quantified by subjective probability. By contrast, 
[10] mentions ambiguity (which is best resolved by 
careful definition during qualitative problem struc- 
turing rather than mathematical modeling) and vo- 
litional uncertainty (an individual's own uncertainty 
in his own actions), which cannot be measured by 
the tools of subjective probability, although it can 
be assessed by an independent observer. 

In the context of engineering design it seems use- 
ful to define another kind of uncertainty — that of 
tolerance uncertainty. This represents the variation 
expected in a parameter across the design envelope. 
For example, one might be interested in the failure
rate (assumed constant) associated with a piece of
equipment. Since that failure rate will depend on
various design, construction, environmental and usage
factors assumed in the definition of the design
envelope, we can write it as $\lambda(e)$, where e represents
chosen factors.

Assuming that e is constrained to lie in a design
envelope E, the tolerance uncertainty associated
with $\lambda$ and E is the interval

\[ \left[ \min_{e \in E} \lambda(e), \; \max_{e \in E} \lambda(e) \right]. \]

It does not always make sense to place a probability 
distribution on E. This is because some variables are 
subject to choices made by the designer, the manu- 
facturer, the user or the maintainer. (At the simplest 
level this could be a mandatory rule to the user to 
avoid certain conditions that are known to induce 
failure.) 
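The tolerance interval defined above can be approximated numerically by evaluating the failure rate over a grid of points in the design envelope. The sketch below assumes a discretized envelope and a made-up failure rate model; the factor names (temp_c, duty), the functional form and the numbers are hypothetical and serve only to show the calculation. For a continuous envelope a grid gives only an approximation to the true minimum and maximum.

```python
from itertools import product


def tolerance_interval(rate, envelope):
    """Return (min, max) of rate(e) over all grid points e of the envelope.

    envelope maps each factor name to the candidate values it may take.
    """
    names = list(envelope)
    values = [
        rate(dict(zip(names, combo)))
        for combo in product(*(envelope[name] for name in names))
    ]
    return min(values), max(values)


def example_rate(e):
    """Hypothetical failure rate per hour, rising with temperature and duty."""
    return 1e-4 * (1 + 0.02 * (e["temp_c"] - 20)) * (1 + e["duty"])


full_envelope = {"temp_c": [20, 40, 60], "duty": [0.1, 0.5, 1.0]}
# A mandatory usage rule (assumed here) that forbids continuous duty.
restricted_envelope = {"temp_c": [20, 40, 60], "duty": [0.1, 0.5]}

print(tolerance_interval(example_rate, full_envelope))
print(tolerance_interval(example_rate, restricted_envelope))
```

No probability distribution is placed on the envelope here; restricting the envelope by a usage rule simply narrows the interval, which is the kind of control over tolerance uncertainty that later decisions can exert.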

System engineering places great emphasis on mak- 
ing trade-offs between different aspects of the sys- 
tem — cost, functionality, reliability and so forth. From 
a reliability point of view, one of the principal ways
in which this trade is carried out is by changing 
the design (which may have cost and/or function- 
ality implications), by specifying changes to the de- 
sign envelope (i.e., restricting the way in which the 
system can be used), by specifying changes to the 
maintenance regime or by making changes and mod- 
ifications to the system. Since many of these imple- 
mentations occur after the design process has been 
(notionally) completed, to make the trade-off in the 
best way possible it is necessary to know how much 
the tolerance uncertainty can be controlled by changes 
made later. 

4. ELICITATION WITHIN (RELIABILITY) 
MODELING PHASES 

Reliability models, as is the case with many other 
model classes, are developed and applied through 
three modeling phases. These phases can occur at 
any point within the systems engineering process, 
depending on the question at hand. The conceptual 
phase is one of model structuring in which a qualita- 
tive form is given to the model. That is followed by 
an initial quantification stage and then by a revision 
stage in which increasing quantities of real system 
data can be utilized. 

In all three modeling phases there is a role for 
expert judgement. In the first, the primary role is in 
model selection and initial qualitative specification. 
In the second, expert judgement has the key task of 
providing the initial quantitative estimates. In the 
final phase, expert judgement plays an important 
role in interpreting the relevance of available data. 

We discuss below the roles that expert judgement 
plays in these modeling phases, but first we discuss 
frameworks described in the literature that aim to 
stretch across both modeling and systems engineer- 
ing phases. 

4.1 Meta Modeling Frameworks 

The programs PREDICT (Performance and Reli- 
ability Evaluation with Diverse Information Combi- 
nation and Tracking) [83, 84] and REMM (Reliabil- 
ity Enhancement Methodology and Modelling) [148] 
are two modeling frameworks used to estimate reli- 
ability throughout the systems engineering phases. 
Both models begin with a problem structuring stage, 
which consists of eliciting a graphical representa- 
tion of the relationships between relevant engineer- 
ing concerns or potential failure modes and the re- 
liability experienced by the system. These graphs 



form the structure of the stochastic model; this es- 
sentially represents a meta model within which stan- 
dard probability models can be integrated. The 
stochastic model is populated with either expert judge- 
ment or relevant historical data. Thus these approaches 
provide one unified decision support framework through- 
out the system design and development, supporting 
sensitivity analysis as well as credible intervals of 
the uncertainty in the reliability. Furthermore, as 
system-specific data become available through anal- 
ysis and test, the model parameters can be updated. 
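The updating step described above can be illustrated with a standard conjugate model. The sketch below is not the PREDICT or REMM model; it assumes a constant failure rate, a gamma prior elicited from experts, and test data summarized as a number of failures over an accumulated exposure time. All function names and numbers are hypothetical.

```python
from math import sqrt


def update_gamma(prior_shape, prior_exposure, failures, exposure_hours):
    """Conjugate update of a gamma prior on a constant failure rate.

    The prior behaves like prior_shape failures seen in prior_exposure hours.
    """
    return prior_shape + failures, prior_exposure + exposure_hours


def summarize(shape, exposure):
    """Mean and standard deviation of a gamma(shape, exposure) distribution."""
    return shape / exposure, sqrt(shape) / exposure


# Elicited prior (assumed): roughly 2 failures per 10,000 operating hours.
prior = (2, 10_000)
# System-specific test data (assumed): 1 failure in 20,000 hours.
posterior = update_gamma(*prior, failures=1, exposure_hours=20_000)

prior_mean, prior_sd = summarize(*prior)
post_mean, post_sd = summarize(*posterior)
print(prior_mean, prior_sd)
print(post_mean, post_sd)
```

In this example the posterior standard deviation is smaller than the prior one, which mirrors the reduction in uncertainty expected as system-specific observations accumulate.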

Such frameworks rely on expert judgement for the 
reliability assessment at a system level and aim to 
overcome the limitations of traditional approaches, 
which according to [62] and [73] tend to provide 
overly optimistic estimates of reliability due to their 
failure to account for major sources of early failures 
such as design defects, process flaws and human er- 
ror. 

We move now to a discussion of the three modeling 
stages. 

4.2 Qualitative Model Structuring 

We distinguish between four types of structuring 
activity that play a role within the design and de- 
velopment phases: capturing and defining require- 
ments, eliciting failure modes, selecting model for- 
mulations and robust design. 

4.2.1 Requirements capture and concept definition. 
Reliability requirements drive the modeling process 
as shown in Figure 3 because they inform targets 
against which reliability estimates will be compared. 
Reliability requirements are expressed in a fairly 
standard form in most engineering design projects. 
O'Connor [111] provided guidelines on what should
and should not be included. Since reliability require- 
ments can drive significant costs, they should be mo- 
tivated and ideally derived from user demands about 
the system functionality and from an understanding 
of what the current technology levels can support. 
However, such a derivation requires many assump- 
tions about the pattern of use and the environment 
in which that will take place. 

While it is acknowledged by designers of hardware
systems that the customer's requirements of
the item are of paramount importance [71, 111],
few recent articles have been published on this topic
compared with the literature on
requirements setting for software systems [108]. In
our experience with hardware systems it seems that 
systematic modeling is not performed in the deriva- 
tion of requirements, and historical precedent (i.e., 
the requirements that were set for the last version)
is used as an alternative. 

A focus of research within the software community 
has been the elicitation, analysis and management 
of system requirements. Two dominant, but comple- 
mentary, methods for analysis are goal oriented [35] 
and use case analysis [2]. The former is concerned 
with eliciting system constraints, while the latter is 
concerned with eliciting system behavior [151]. Pro- 
cesses have been proposed to support creative think- 
ing about requirements [98] and capture stakeholder 
views [32, 39]. Comprehensive rigorous processes for 
requirements definition have been suggested, for ex- 
ample, in [158] and [6]. 

A further approach of value in reliability is QFD 
[130, 139], which provides a broad-brush, semiquan- 
titative assessment of the relationship between those 
factors that can be controlled by engineering design 
and those characteristics valued by users. 

A study of requirements changes throughout a 
project is given in [97]. The problem of so-called re- 
quirements creep can be endemic, and modeling the 
development of requirements throughout a project 
is not easy. Within the context of software development
Stallinger and Grünbacher [137] explored
modeling this with system dynamics; see also [63, 
64, 65, 66]. 

4.2.2 Eliciting failure modes. Qualitative reliabil- 
ity modeling is routinely conducted during concept 
design to elicit and structure the failure modes that 
are likely to drive the (un)reliability. Methods used
include failure mode and effects analysis (FMEA)
[20], which obtains an understanding of the ways in
which different types of failure can occur, while haz- 
ard analysis [85], top-level event tree (ETA) and 
fault tree (FTA) analyses [4] can give an indication 
of how the system functions. These types of analysis 
are prospective and can be extended in later stages 
when more information is known about the system. 
In contrast, root cause analysis [38] provides a pro- 
cess for retrospective forensic analysis of observed 
events to identify the drivers so that lessons learnt 
can inform use and maintenance of the operational 
system; however, such data can also inform design 
modifications to a new generation. 

Elicitation of subjective judgement plays a pivotal 
role in such qualitative analysis with all methods us- 
ing some semistructured process to gather and or- 
ganize data. For example, FMEA aims to develop 
a model of the causes, modes and effects of failures 



as they impact the system through different levels 
of indenture. Conceptually, FMEA aims to popu- 
late an exhaustive sample space of potential events 
that could impact reliability from a design or process 
perspective. The approach to elicitation is to frame 
questions either in terms of functionality, architec- 
ture or process and systematically think through 
each level in a bottom-up (i.e., from parts to sys- 
tem) manner. In contrast, FTA assumes a top-down 
approach to elicitation. Critical events, or so-called 
top events, are defined in terms of departures from 
requirements. For any system there may be one or 
more top events. For each, a tree is constructed by 
drilling down the sequence of events that could cause 
or exacerbate a failure. Fault trees can accommo- 
date failures with more than one cause, while FMEA 
cannot. Hazard analysis represents a structured elic- 
itation of potential operational hazards to a system 
during installation, production and decommission- 
ing using a set of prescribed keywords to manage 
the content analysis. 
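Once a fault tree has been elicited top-down, its structure can be quantified in a simple way. The sketch below computes a top event probability from AND and OR gates under the strong assumption that the basic events are independent; the small tree, its event names and the probabilities are invented for illustration and do not come from any cited study.

```python
from math import prod


def and_gate(*probabilities):
    """All inputs must occur: product of independent event probabilities."""
    return prod(probabilities)


def or_gate(*probabilities):
    """At least one input occurs: complement of none of them occurring."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic events for a top event "loss of cooling".
pump_a_fails = 0.1
pump_b_fails = 0.1
power_fails = 0.01

# The redundant pumps must both fail, or the shared power supply fails.
both_pumps_fail = and_gate(pump_a_fails, pump_b_fails)
top_event = or_gate(both_pumps_fail, power_fails)

print(top_event)
```

Here the shared power supply contributes as much to the top event as the loss of both pumps, a comparison that the tree structure makes explicit once the basic events have been quantified.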

The principles of the aforementioned approaches 
aspire to be systematic; however, there has been crit- 
icism of their reported implementation. The FMEA 
has been criticized within the aerospace industry 
[100] because it has been implemented too late in the 
product development process and in a manner that 
does not allow information to be fed back to inform 
the product design. White [153] criticized the gen- 
eral approach to use of the suite of standard meth- 
ods, claiming the manner of their use is reductionist, 
and proposed that a systems approach that exploits 
multiple partial views and explores the problem en- 
vironment would result in richer information. 

It is not known how valid these criticisms are for 
all industries. There is evidence that these methods 
are being used effectively to influence system de- 
sign of, for example, space systems [52], but there 
is undoubtedly a lack of reporting in the literature 
by manufacturers and there is no known scientific 
survey of the effectiveness and efficiency of their 
application. Anecdotal evidence suggests that practice
varies by industry: for example, consumer product
manufacturers embrace elicitation of failure modes as
part of their quality processes [33, 119], while others
remain largely accountants of failure modes.

Recent research related to these qualitative meth- 
ods has been dominated by two avenues: (1) au- 
tomation of knowledge capture and representation 
[93] and (2) quantitative prioritization rules [21] and 
computational algorithms [5, 43]. An exception has 
been the work described in [17, 62, 83, 84, 147,
148] which developed elicitation processes that em- 
brace a systems approach and the scientific prin- 
ciples of structured judgement [27] fundamental to 
sound data collection. They aim to elicit the core 
concerns held by all relevant stakeholders during 
early design through a sequence of semistructured 
interviews using simple mapping [41]. Discussions 
are triggered by focussing on the changes between 
generations of systems designs in terms of technol- 
ogy, process and use. The maps developed capture 
the reasoning trail that links to formal records within 
a defined failure taxonomy [50] that can be revisited 
and updated as the design evolves. This approach 
is captured within Figures 3-6 through the initial 
activity to elicit failure modes in early design and 
subsequent elicitation exercises in later phases. 

4.2.3 Selecting and structuring models. Surpris- 
ingly little has been written about the qualitative 
structuring process for reliability models. At a prac- 
titioner level, guidance on the selection of tools to 
match modeling objectives exists within international 
and company standards. At a more abstract level, 
the principles of requisite modeling can be applied 
[120, 121, 123]. 

The standard systems reliability models are all 
based essentially on cause and effect, and include 
FTA, ETA, reliability block diagrams (RBD) and 
(semi) Markov modeling [16, 126]. These methods 
provide subtly different, but related, representations 
of the system. Typically used in a hierarchical way, 
they can be used at different levels of system inden- 
ture, enabling the reliability engineer to fill in more 
detail, as it becomes known. Keller and Modarres 
[81] provided details of their early history. See also 
[31, 99, 143]. 
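
To make the cause-and-effect logic of these models concrete, the following is a minimal sketch of how a small fault tree might be quantified. It assumes that all basic events are statistically independent, which is a simplification, and the top event, the gate helpers and every probability are hypothetical values chosen purely for illustration.

```python
from math import prod


def or_gate(probabilities):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """Output event occurs only if all independent input events occur."""
    return prod(probabilities)


# Hypothetical top event: loss of cooling.
# It occurs if the power supply fails OR both redundant pumps fail.
p_power = 0.01
p_pump_a = 0.05
p_pump_b = 0.05

p_both_pumps = and_gate([p_pump_a, p_pump_b])
p_top = or_gate([p_power, p_both_pumps])
print(f"P(top event) = {p_top:.6f}")
```

The same two helpers can be nested to add detail at lower levels of indenture as it becomes known; dependence between events, such as common cause failure, would require a richer model than this sketch.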

There is often a perception that there exist "cor- 
rect" models which can be found by the applica- 
tion of appropriate quality control. However, there 
are important choices to be made about the model 
scope. How deep (or detailed) should the model be? 
What failure events should be considered? Which 
environments, or failure scenarios, should be con- 
sidered? These questions are the subject of expert 
judgement, albeit usually unstructured. 

Graphical based methods such as FTA and RBD
are popular during design because they provide useful
representations of the system, linking probabilistic
assessments with physical structure and functionality.
However, the frameworks are not without
shortcomings and recent research has proposed the
use of Bayesian belief networks (BBN) as a more
flexible substitute [134]. The BBNs can be constructed
to directly map onto potential engineering decisions 
[11]; they can be constructed to capture temporal 
effects [18]; they can capture common cause failure 
modes [140]; they can capture anticipated changes 
in reliability due to manufacturing and operational 
demands [152]; and, finally, BBNs can be used to fa- 
cilitate decision-making subject to multiple criteria 
[47]. This is important during concept design when 
the strengths and weaknesses of design options are 
traded off. Several case studies describe the appli- 
cation of BBNs to reliability modeling of complex 
systems. See, for example, [19, 46, 107, 159]. Leish- 
man and McNamara [94] described an ethnographic 
approach to qualitatively structuring a reliability 
model. Such an approach makes use of in-depth in- 
terviews with relevant participants. The data ac- 
quired through the interviewing processes are struc- 
tured via Bayesian networks; see also [155]. 

4.2.4 Robust design. The stress-strength relation- 
ship is core to reliability engineering. Conventional 
modelling, as discussed above, provides estimates of 
whether the system design possesses the strength to 
meet the nominal stresses within the specified oper- 
ational environment. However, there can be consid- 
erable uncertainty about the actual stresses encoun- 
tered in operation and, hence, analysis to examine 
the robustness of the design to variation in stresses 
is important. 
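
As a minimal sketch of why robustness to stress variation matters, suppose, purely for illustration, that strength and stress are independent and normally distributed, so that reliability is the probability that strength exceeds stress. The function name and all numerical values below are hypothetical.

```python
import math


def stress_strength_reliability(mu_strength, sd_strength, mu_stress, sd_stress):
    """P(strength > stress) for independent normal strength and stress."""
    margin = mu_strength - mu_stress
    spread = math.sqrt(sd_strength ** 2 + sd_stress ** 2)
    # Standard normal CDF evaluated at margin / spread.
    return 0.5 * (1.0 + math.erf(margin / (spread * math.sqrt(2.0))))


# Same nominal stress, increasing uncertainty about the stress met in operation.
for sd_stress in (5.0, 10.0, 20.0):
    r = stress_strength_reliability(100.0, 5.0, 70.0, sd_stress)
    print(f"stress sd = {sd_stress:5.1f}  reliability = {r:.4f}")
```

The nominal margin is identical in the three cases; only the spread of the operational stress changes, and the computed reliability falls as that spread grows.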

The concept of robust design [116] is fundamen- 
tal to the quality movement and encompasses the 
work on experimental design and analysis. Condra 
[25], among others, defines reliability as "quality 
through time" and advocates the importance of sta- 
tistical experimental design in reliability improve- 
ment. There are limited reports of its use in prac- 
tical reliability engineering, although see, for exam- 
ple, [33] for its use within the automotive industry. 
Perhaps this is not too surprising since the ability 
to replicate repeated trials is most feasible for those 
systems which will be mass produced. Others have 
discounted the influence of experimental design on 
traditional reliability testing because of the identifiability
problems given the small amount of data relative
to control parameters [79]. The increasing role
of simulated experiments may remove such physical
constraints.

Elicitation is required to support not only design
of experiments, but also specification of standard re- 
liability tests, such as growth development tests and 
production acceptance tests. Again there is little re- 
ported about how this can and should be achieved. 
Exceptions are Condra [25] and Davis [33], who share
insight into the identification of the failure modes 
that influence the choice of response variable and 
the semistructured methods used to identify the ex- 
planatory variables and their experimental settings. 

Methods for elicitation of stakeholder judgements 
abound in the quality literature. A useful summary 
is given in [78], which summarizes 100 methods by 
purpose, when to use, how to use and benefits, and 
provides an example. The methods are classified into: 
management methods, analytical methods, idea gen- 
eration, data collection, analysis and display. While 
the scope is comprehensive, all tools are treated as 
independent entities. 

A recent special issue of the journal Quality and 
Reliability Engineering International (April 2005) 
provided some interesting reviews of the role of six 
sigma in the 21st century and the key interaction 
between the softer and harder aspects of statisti- 
cal modeling within industry. Hahn [60] emphasized 
the key goal of designing products with long life and 
high reliability, and identified the need to include re- 
liability modeling within the six sigma toolkit. The 
use of six sigma through the life cycle of an auto- 
mated decision support system is discussed in [117], 
again highlighting the synergies with systems engi- 
neering, although broadening the issues beyond the 
engineering to include service processes. Anderson- 
Cook, Patterson and Hoerl [3] described graduate 
training with special emphasis on the role of struc- 
tured problem solving within a program that aims to 
develop the facilitation skills of statisticians within 
a project life cycle. 

Experiments, tests and statistical quality control 
are encompassed by the generic term "task" used 
in Figures 3-5. We propose that their value should 
be assessed during reliability planning and the data 
from their implementation should be used to revise 
modeling estimates, which we shall discuss further 
later. 

4.3 Initial Quantification 

Most of the key probabilistic models used in re- 
liability are quantified through mixtures of expert 
judgement and generic, or other, surrogate data. 



4.3.1 Reliability models. Before discussing the var- 
ious techniques used for quantification, we give a 
brief overview of some of the models used, arranged 
according to the systems engineering phases. Note, 
however, that there is no rigid restriction of mod- 
els to the phases we have associated them with, 
as preliminary studies are frequently carried out in 
earlier phases and detailed later. For example, de- 
cisions about production, maintenance and opera- 
tional support will tend to be made in development 
using information about the failure modes elicited in 
design. In turn, tasks included in the reliability plan 
and used to revise estimates after implementation 
include the engineering analysis and test methods 
discussed below. 

Design and development. Concept design is char- 
acterized by the need to make trade-off decisions: 
cost against functionality, weight against strength 
and so on. In principle reliability requirements should 
play a part in these trade-offs too, with model pre- 
dictions being inputs to the game. However, although 
there is a wide literature on reliability optimization 
(see [91] for a survey), this literature generally makes 
the assumption that the system structure and the 
reliability characteristics of parts are quite well de- 
fined. This is not usually the case within early de- 
sign: hence, the difficulties of predicting future reli- 
ability quantitatively are such that reliability tends 
not to play a major role in the trade-off discussion 
[14]. 

In addition to the systems reliability models listed 
in Section 4.2.3 and widely used in practice, many 
probability models have been reported in the liter- 
ature. For example, Singpurwalla [135] provided a 
taxonomy of stochastic models that are useful for 
reliability modeling in dynamic environments. This 
is important because not only are there uncertain- 
ties in the operational stresses under given condi- 
tions, but there can be anticipated variation on the 
demand patterns. Renewal processes are commonly 
used [8], although other people adapt FTA to cap- 
ture a dynamic environment [1]. 

Physical failure modeling is used extensively within 
simulation during detailed design and development. 
Mathewson et al. [101] provided a review of simula- 
tion tools used within the design process for predic- 
tive inference as well as for supporting optimal de- 
sign decisions. The majority of these models make 
extensive use of component level physical models 
which are adjusted by empirical data for calibration.
The main criticism of these models is their limited
focus on one failure mechanism per model [13].

Engineering testing remains a staple part of reliability
programs, but growth testing is now more
prevalent than demonstration testing. Many tests 
will be conducted under accelerated conditions (see 
[109, 110] for a bibliography of accelerated test plans) 
and they generate few observations. Consequently, 
research in this field is dominated by Bayesian ap- 
proaches; see [44, 58, 77, 124, 125]. A notable excep- 
tion is modeling with covariates [40]. 

Within civil engineering, expert judgement is now 
commonly used to incorporate assessments of un- 
certainties into design decision-making. In this area 
much of the design decision-making is in the context 
of the management and upgrading of existing assets. 
See [34] for a discussion of a performance-based as- 
set management system for flood defenses which is 
driven by expert judgements. The Dutch are going 
through a process of reevaluating their risk crite- 
ria for dikes, and much of the technical preparatory 
work has involved the use of expert judgement to as- 
sess uncertainties in the physical models of dike fail- 
ure [30, 145]. Similarly, expert judgement has been 
used to quantify physical models that describe the 
behavior of buildings [36]. 

Manufacture and installation. There are a few 
unique systems where active design continues into 
manufacturing — mainly in space systems and civil 
engineering structures. However, for most systems, 
the emphasis in the manufacturing phase is on pro- 
duction quality control. For mass production, estab- 
lished methods of statistical process control can be 
used for the key failure modes elicited during de- 
sign and development. Systems with a low volume 
of assembly and many manual processes may rely on 
product screening [67]. For such systems, which in- 
clude aerospace and military systems, early produc- 
tion models can also be used in prerelease testing 
to give either the manufacturer or the client confi- 
dence in the reliability of the product. The design 
and analysis of these trials possess the same data 
challenges as reliability demonstration tests earlier 
in development. 

Operation and maintenance. Several authors have 
acknowledged the role of expert judgement within 
maintenance modeling, notably, Lu, Wang and Chris- 
ter [96], who combined subjective judgement about 
preventive maintenance with failure records to sup- 
port delay time modeling of plants, and Kunttu and 
Kortelainen [90], who presented a case study using 
expert judgement within a Poisson model to support
maintenance decisions. Van Noortwijk et al.
[142] proposed a maintenance optimization model
and used a linear pool to combine expert opinion to 
assess the lifetime distribution. See also [149] for a 
review of subjective estimation in maintenance mod- 
eling. 

Murthy, Solem and Roren [106] provided a com- 
prehensive review of warranty modeling, and Kleyner 
and Sandborn [86] provided a warranty model based 
on Weibull and exponential models where the pa- 
rameters are estimated by data using stochastic sim- 
ulation to overcome mathematical intractability. Ward 
and Christer [150] acknowledged the need for ex- 
pert assessment for warranty modeling. Examples 
of Bayesian approaches include [68, 122, 138]. 

Real-time condition monitoring is an important 
tool in maintenance decision-making. When mod- 
eled, a degradation signal can be used to estimate 
residual life. The data obtained through measur- 
ing aspects of the degradation process of each of 
the system's components can be used as concomi- 
tant variables in a proportional hazards model [144]. 
Alternative approaches include [87], which uses a 
Markov chain to capture the degradation process. 
The usefulness of engineering judgement for inter- 
preting such data is evident and, as such, Bayesian 
methods potentially play an important role in this 
part of the cycle (see, e.g., [55]). 

A variety of problems are associated with civil 
engineering structures during the operations phase. 
Assessments of the times required to evacuate a dike 
ring are made in [9], while the time required to safely
close the movable barriers in a dike ring structure 
is modeled in [141]. Degradation process modeling 
is very important, particularly where inspection or 
condition monitoring may be costly, such as with 
sewers [88], or where the underlying processes are 
difficult to predict, such as with coastal erosion [61]. 

4.3.2 Expert judgement collection. All of the above 
models require instantiation. Typically they are quan- 
tified through expert judgement using a similar set 
of techniques that we now discuss. 

The initial quantification of reliability models is, 
in practice, frequently an unstructured search through 
historical systems data and generic data bases to 
find "ball park" parameter estimates. 

A variety of problems are encountered here: 

• The combination of opinions of different experts. 

• The transformation of combined data into assess- 
ments of parameters within a model. 

• The combination of expert and generic system 
data. 

4.3.3 Expert combination. When more than one
expert is contributing assessments about a quantity 
of interest, then the analyst has the problem of com- 
bining them in some way. There are, broadly, two 
approaches to this. The first is to pool the data — 
generally through the use of a linear pool. The sec- 
ond is to regard the expert assessments as observa- 
tions and use a Bayesian model to combine them. 
An even simpler and older method is that of paired 
comparisons (see the discussion in [27]). 

Pooling. A key issue with pooling schemes is
choosing which properties to preserve as we switch
between the individual experts and the aggregated
pooled expert. For example, assessments which are
statistically independent
for individuals are not necessarily independent for 
the pooled expert and updating the pooled expert 
through Bayes' theorem does not necessarily provide 
the same distribution as aggregating all the updated 
individual distributions (see, e.g., [56] or [45]). 
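
The second of these properties can be checked with a small numerical sketch. It assumes two experts, two competing hypotheses about a failure probability per demand, equal pooling weights and a single observed failure; all names and numbers are hypothetical and chosen only to show that the two orders of operation need not agree.

```python
def normalise(values):
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def bayes_update(prior, likelihood):
    """Posterior over the hypotheses, proportional to prior times likelihood."""
    return normalise([p * l for p, l in zip(prior, likelihood)])


def linear_pool(distributions, weights):
    """Weighted average of the experts' probabilities, hypothesis by hypothesis."""
    n_hypotheses = len(distributions[0])
    return [sum(w * d[i] for w, d in zip(weights, distributions))
            for i in range(n_hypotheses)]


# Hypotheses: failure probability per demand is 0.01 or 0.10.
expert_a = [0.9, 0.1]
expert_b = [0.2, 0.8]
weights = [0.5, 0.5]
likelihood = [0.01, 0.10]  # probability of the observed failure under each hypothesis

pool_then_update = bayes_update(linear_pool([expert_a, expert_b], weights), likelihood)
update_then_pool = linear_pool(
    [bayes_update(expert_a, likelihood), bayes_update(expert_b, likelihood)], weights
)
print("pool then update:", [round(p, 4) for p in pool_then_update])
print("update then pool:", [round(p, 4) for p in update_then_pool])
```

The two results differ because, after the observation, the experts' own updated beliefs are averaged with the original weights, whereas updating the pooled expert implicitly shifts weight toward the expert who gave the observed event a higher probability.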

We refer to Cooke [27] for a discussion of the dif- 
ferent types of pooling, but note that he argued 
strongly for a linear pooling of expert distributions. 
Each expert is assigned a weight which is used to 
form a weighted combination of expert distributions. 
The weight should not be interpreted as a proba- 
bility, as one cannot associate the experts with a 
collection of exclusive and exhaustive events. The 
choice of weight is difficult to justify. While a com- 
mon pragmatic approach is to use equal weightings 
of experts (see the description of NUREG 1150 given 
in [80]), Cooke has argued that performance-based 
weighting is more effective and better meets impor- 
tant underlying principles including empirical con- 
trol. For more details see [10, 27], and for a moment- 
based approach see [156]. 
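
A linear pool of continuous expert distributions can be sketched as follows. The sketch assumes that each expert supplies a normal distribution for the base-10 logarithm of a failure rate; the experts, their distributions and both sets of weights are hypothetical, and the unequal weights are illustrative only and are not derived from any calibration data.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def pooled_cdf(x, experts, weights):
    """Linear pool: weighted average of the experts' CDFs at x."""
    return sum(w * normal_cdf(x, m, s) for w, (m, s) in zip(weights, experts))


def pooled_quantile(p, experts, weights, lo=-50.0, hi=50.0):
    """Invert the pooled CDF by bisection on the interval [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if pooled_cdf(mid, experts, weights) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical experts: (mean, sd) of log10 failure rate per hour.
experts = [(-5.0, 0.3), (-4.0, 0.5), (-4.5, 0.2)]
equal = [1 / 3, 1 / 3, 1 / 3]
unequal = [0.6, 0.1, 0.3]  # illustrative weights, not performance-based scores

for name, w in (("equal", equal), ("unequal", unequal)):
    q05, q50, q95 = (pooled_quantile(p, experts, w) for p in (0.05, 0.5, 0.95))
    print(f"{name:8s} 5%={q05:.2f}  50%={q50:.2f}  95%={q95:.2f}")
```

The pooled distribution is a mixture, so its spread reflects disagreement between the experts as well as each expert's own uncertainty; changing the weights moves both the median and the tails.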

Bayesian combination. The main difference in phi- 
losophy between pooling and Bayesian combination 
is that the latter treats expert assessments as if they 
are observations. Hence there has to be a specifica- 
tion of the likelihood function of the expert data. 
This feature, which raises serious problems for the 
analyst, is most clearly visible in the multivariate 
normal model used by Mosleh and Apostolakis [105]. 
Here the expert assessment is modeled as being equal 
to the "true" value of the parameter of interest, plus 
a normally distributed error, which is considered to 
be independent of the true value. In principle, the 
analyst then has to specify the multivariate distribution
of expert errors: means (which can be interpreted
as expert biases), variances (which can be
interpreted as degree of certainty) and covariances
(which reflect the degree of correlation of the group 
of experts). While these are all quantities of some 
interest in assessing expert opinions, it is not clear 
on what basis the analyst can assess them without 
being in a superior position of expertise to that of 
the experts. 
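
To show what the analyst is being asked to supply, the following is a minimal sketch of the additive normal error model for two experts, combined with a normal prior on the quantity of interest. It is an illustration of the general structure described above rather than a reproduction of the model in [105]; the function name, the restriction to two experts and all numerical values are assumptions.

```python
def combine_two_experts(x, bias, sd, rho, prior_mean, prior_sd):
    """Posterior mean and sd of theta when each estimate is theta plus a normal error.

    x, bias and sd are pairs (one entry per expert); rho is the correlation
    between the two experts' errors.
    """
    # Covariance matrix of the expert errors.
    v1, v2 = sd[0] ** 2, sd[1] ** 2
    c = rho * sd[0] * sd[1]
    det = v1 * v2 - c * c
    # Entries of the inverse covariance (precision) matrix.
    p11, p22, p12 = v2 / det, v1 / det, -c / det
    # Estimates corrected for the assumed biases.
    y1, y2 = x[0] - bias[0], x[1] - bias[1]
    data_precision = p11 + p22 + 2.0 * p12           # 1' P 1
    data_term = (p11 + p12) * y1 + (p12 + p22) * y2  # 1' P y
    post_precision = 1.0 / prior_sd ** 2 + data_precision
    post_mean = (prior_mean / prior_sd ** 2 + data_term) / post_precision
    return post_mean, post_precision ** -0.5


estimates = [10.0, 12.0]
mean_ind, sd_ind = combine_two_experts(estimates, [0.0, 0.0], [1.0, 1.0], 0.0, 0.0, 1e6)
mean_cor, sd_cor = combine_two_experts(estimates, [0.0, 0.0], [1.0, 1.0], 0.9, 0.0, 1e6)
print(f"independent experts: mean={mean_ind:.3f} sd={sd_ind:.3f}")
print(f"correlated experts:  mean={mean_cor:.3f} sd={sd_cor:.3f}")
```

Every argument other than the estimates themselves (biases, standard deviations and correlation) has to be set by the analyst, which is exactly the difficulty noted above; strongly correlated experts add little information beyond a single expert.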

There are now many other Bayesian methods avail- 
able, especially techniques that incorporate Bayesian 
networks, which have essentially the same require- 
ment that the analyst develop a likelihood function 
for expert data. See [24] for a discussion of a variety 
of such models. The difficulty of structuring such a 
model depends of course on the details of the model 
and the context in which it is used. For example, see 
[129], which takes assessments of numbers of failures
to assess parameters of a nonhomogeneous Poisson 
process, and [57], which discusses the possible ad- 
vantages of a Bayes linear framework. 

4.3.4 Transformation to parameters and families 
of distributions. Both theoreticians and practition- 
ers can easily forget that many of our favorite model 
parameters, such as failure rate, are not actually ob- 
servable quantities at all, but are simply parameters 
of a model that we want to use to make predictions 
about the future. It has been strongly argued on 
foundational grounds (e.g., the discussion in Chap- 
ter 2 of [10]) that we can only ask for probability 
assessments on observable quantities. Hence there is 
a need to infer from those assessments which prob- 
ability distributions on model parameters are con- 
sistent. This approach was developed by Cooke [28]; 
more algorithms and underlying theory are given in 
[89]. 

Taking a more standard Bayesian perspective, 
Percy [118] discussed the indirect assessment of hy- 
perprior parameters through the direct assessment 
of quantiles of observables whose distribution is a 
prior predictive of the unknown Bayesian model. 
Gutierrez-Pulido, Aguirre-Torres and Christen [59]
took a similar line, considering both moments and 
quantiles of the time to failure for a system as sources 
of information from which prior distributions can 
be fitted. Such methods could also be applicable to 
other Bayesian contexts where prior distributions on 
lifetime distribution parameters are to be assessed, 
for example, in Bayesian accelerated or proportional 
hazards life modeling [22]. 
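
The idea of assessing prior parameters indirectly from quantiles of an observable can be sketched in a simple conjugate setting. We assume, for illustration only, exponential times to failure with a gamma prior (shape and rate) on the failure rate, so that the prior predictive survival function is (rate / (rate + t))^shape. The expert is asked for the median and 90th percentile of the time to failure; the function names and elicited values are hypothetical, and this is not the specific procedure of [118] or [59].

```python
import math


def predictive_quantile(p, shape, rate):
    """Quantile of time to failure under the gamma-exponential prior predictive."""
    # Solve (rate / (rate + t)) ** shape = 1 - p for t.
    return rate * math.expm1(-math.log(1.0 - p) / shape)


def fit_gamma_prior(t50, t90):
    """Find the gamma shape and rate whose predictive median and 90% point match."""

    def ratio(shape):
        # The ratio of the two quantiles depends on the shape only.
        return predictive_quantile(0.9, shape, 1.0) / predictive_quantile(0.5, shape, 1.0)

    target = t90 / t50
    lo, hi = 0.05, 1e6
    if not ratio(hi) < target < ratio(lo):
        raise ValueError("elicited quantiles are not attainable under this model")
    # The ratio decreases as the shape grows, so bisect on a log scale.
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if ratio(mid) > target:
            lo = mid
        else:
            hi = mid
    shape = math.sqrt(lo * hi)
    rate = t50 / predictive_quantile(0.5, shape, 1.0)
    return shape, rate


# Hypothetical elicited values, in operating hours.
shape, rate = fit_gamma_prior(1000.0, 5000.0)
print(f"shape = {shape:.3f}, rate = {rate:.1f} hours")
```

Under this model the ratio of the 90th percentile to the median must exceed ln 10 / ln 2 (about 3.32), so the fit also acts as a coherence check on the elicited values.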

In the absence of an assumed class of conditional 
models it becomes much more difficult to assess a
family of conditional distributions: in the context of
decision support, it is necessary to consider families 
of distributions indexed by the decision variables. 
When the decision space is small and discrete, re- 
peated elicitation can be used, but in the case of a 
continuous family this becomes more difficult. We 
are not aware of much work in this area, but it is 
worth mentioning work by Cooke and Jager, who 
expressed the probability of an event in terms of sys- 
tem parameters in a Taylor series [26]. A study by 
Willems et al. [154] used graphical methods to elicit 
conditional quantiles from experts. In the context 
of human reliability analysis, proportional hazards 
type models have been used in which the parame- 
ters are assessed purely through various judgemental 
techniques, such as paired comparison or multicrite- 
ria decision analysis. See [42] in particular and the 
discussion in [10] for other examples. 

4.3.5 The combination of expert and historical data. 
Heritage data for historical systems provide insights 
into the observed reliability of related systems. See, 
for example, Figure 3, which highlights the selection 
of historical data to inform the base reliability of the 
new system. 

Historical data may be obtained from generic data 
bases or company-specific event data bases. Generic 
reliability data are usually based on operating data 
drawn from a variety of sources and mixed together. 
Many generic databases exist; usually they are sector- 
specific. To adapt such generic information to a 
system-specific setting, reliability data bases have 
traditionally used environmental loading factors, such 
as the Military Handbook 217 [37] (hereafter Mil- 
Hdbk-217). 

Mil-Hdbk-217 expresses failure rates for components
using so-called π factors, which are multiplication
factors that depend on environmental or
usage factors. To determine the appropriate failure
rate, the analyst simply has to find the correct component
description, identify the appropriate environmental
or usage factors and then multiply the
base failure rate by the π factors given in the table.
These numbers are given to a very high degree
of accuracy and are an attempt to represent the 
dependence of reliability on at least some param- 
eters. Unfortunately, because they do not represent 
the dependence on all the parameters, the accuracy 
given is misleading. By contrast, the IEEE-500 data 
base and others based on similar principles, such 
as OREDA and EIREDA, specify much about the
system and its operating conditions, but explicitly
present the remaining variability in the failure rates. 
Fragola [50] called these resources "third generation 
databases." 
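
The contrast between the two styles of database can be sketched as follows. The base failure rate and π factor values are hypothetical and are not taken from any handbook table. The range is represented using the common convention of an error factor, taken here to be the ratio of the 95th percentile to the median of a log-normal distribution; that convention, and the value of 10 used for it, are assumptions made for illustration.

```python
import math


def adjusted_failure_rate(base_rate, pi_factors):
    """Point estimate: base failure rate multiplied by each pi factor in turn."""
    rate = base_rate
    for factor in pi_factors.values():
        rate *= factor
    return rate


def lognormal_range(median, error_factor):
    """5th and 95th percentiles when error_factor = 95th percentile / median."""
    return median / error_factor, median * error_factor


def lognormal_sigma(error_factor):
    """Standard deviation of the log failure rate implied by the error factor."""
    return math.log(error_factor) / 1.645  # 1.645 is the 95% standard normal quantile


# Hypothetical component: base rate per hour and illustrative pi factors.
pi_factors = {"environment": 4.0, "quality": 1.5}
point = adjusted_failure_rate(2.0e-7, pi_factors)
low, high = lognormal_range(point, 10.0)
print(f"point estimate: {point:.2e} per hour")
print(f"range:          {low:.2e} to {high:.2e} per hour")
```

Reporting the point estimate alone suggests a precision that the multiplication cannot justify; carrying the range alongside it keeps the remaining variability visible.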

Many reliability "predictions" made in practical 
applications seem to be based on an adaptation of 
generic data through expert opinion, rather than 
from a (possibly Bayesian-based) fusion of the two 
forms of data. For example, in practice it is common 
to adjust generic data to make it system-specific — 
typically through the use of failure rate multipli- 
cation factors — but the methods employed are not 
generally supported by clear and transparent expert 
judgement protocols or models. 

A nice example of failure rate adjustment is given 
by Fragola and McFadden's study of failure rates 
for space station units [51], where experts gather 
and combine different generic data estimates. While 
no clear statistical model is used to justify this, it 
is worth noting that the outputs of the process are 
ranges of failure rates. The third generation databases 
described above all provide ranges of failure rates — 
often described using a log-normal distribution on 
the failure rate parameter. About the point esti- 
mates given to great precision in Mil-Hdbk-217 data, 
Fragola [50] wrote, ". . . failure rates came to be looked 
upon as fixed measures of specific equipment, not 
measures of a spectrum of equipment types." How- 
ever, he suggested also that Mil-Hdbk-217 data are 
perfectly usable, as long as they are used in conjunc- 
tion with uncertainty bands to capture this extra 
variation. 

One of the underlying reasons for the overstated 
accuracy of Mil-Hdbk-217 is that it reflects a large 
amount of testing; this is worth reflecting on further, 
because it has more general implications for the re- 
lationship of old to new system data, and for the 
way data changes through the systems engineering 
process. Old data will often be a poor representation 
of prior information because they do not take into 
account the changes made to the system, usage and 
environment. Although the usual asymptotic con- 
vergence properties hold when updating with the 
new system data, this is of little practical signifi- 
cance because the amount of new system experience 
needed for convergence is not available. Hence the 
speed at which the posteriors will adjust to the "cor- 
rect" failure rate will be affected by the degree of cer- 
tainty we had built up for the previous system: The 
more data we had for the previous system, the more 
slowly the posterior will converge to the correct new 
failure rate. A more appropriate way to model the
new system reliability is to try to model the change 
expected to the old system reliability, and this is 
something that can only be assessed through expert 
judgement. Often the uncertainty about the effect of 
the change will dominate the information from the 
old prior. 
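
The effect of prior certainty on the speed of adjustment can be illustrated with a minimal conjugate sketch. We assume failures follow a Poisson process with a gamma prior on the failure rate whose shape and rate are set directly to the old system's failure count and operating hours; this construction, the function name and all the numbers are hypothetical.

```python
def posterior_mean_rate(old_failures, old_hours, new_failures, new_hours):
    """Posterior mean failure rate under a gamma prior built from old system data."""
    return (old_failures + new_failures) / (old_hours + new_hours)


# Old system failure rate was 1e-3 per hour in both cases.
# New system experience: 2 failures in 10,000 hours (observed rate 2e-4 per hour).
weak = posterior_mean_rate(5, 5_000, 2, 10_000)        # little old system data
strong = posterior_mean_rate(500, 500_000, 2, 10_000)  # extensive old system data
print(f"posterior mean with little old data:    {weak:.2e} per hour")
print(f"posterior mean with extensive old data: {strong:.2e} per hour")
```

With extensive old data the posterior mean remains close to the old failure rate despite the new evidence, whereas with little old data it moves most of the way toward the observed new rate.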

The REMM model [148] explicitly attempts to 
model such effects by considering failure modes in 
the new and existing systems and subjective assess- 
ments about the way the design changes will affect 
them. A higher level approach using Bayes linear 
methods (based on moments rather than probabili- 
ties) was discussed in [57], where expert assessment 
of the change to MTBF (mean time between failures)
is proposed. 

Using historical data from company-specific data 
bases can give rise to similar issues already men- 
tioned for generic data. Although data for the pre- 
vious systems manufactured by the company have 
the potential to relate operating experience directly 
to earlier design decisions, hence supporting inter- 
pretation and selection of base events input to the 
reliability model for the new system, there are also 
industry-specific challenges, for example, censoring 
at the expiry of the warranty period for consumer 
products [33]. 

The common problem with using old system data, 
whatever their source, can be expressed succinctly 
using the notation for system reliability used earlier, 

r = r(d, p, u, m, c).

Suppose the old system data correspond to slightly 
different design, production, usage and maintenance 
patterns. Then the "old" reliability will be 

r' = r(d', p', u', m', c').

The uncertainty ranges given in the third genera- 
tion data bases may be seen as an attempt to rep- 
resent credible ranges by changing these parameters 
within a given envelope. It is finally worth remark- 
ing that it is essential for the future utility of these 
data bases that they maintain the ranges inherent 
in each equipment class. Hence it would be wrong to 
start updating the data bases with system-specific 
data using a straightforward application of Bayes' 
theorem, as this will reduce the variance artificially. 



4.4 Revised Quantification with 
System-Specific Data 

Because data are realized from tasks implemented 
in design, development and manufacture, the initial 
reliability estimate can be revised as captured in 
Figures 3-6. We discuss the two options for revision 
of estimates: Bayesian updating and reelicitation. 

As noted above, badly calibrated prior distribu- 
tions and poorly specified stochastic models com- 
promise Bayesian inference. The quality of inference 
obtained through Bayesian updating is contingent 
on both the prior distribution to capture epistemic 
uncertainty and the choice of model to capture the 
aleatory uncertainty. The latter of these acts like a 
lens through which the data are viewed, so even if a
meaningful prior distribution is elicited, the posterior
distribution may be misleading because this lens may
filter out observations that would sensibly inform
the inference if the choice of model were different.
As such, differences in reelicited prior distributions
compared with posterior distributions may be due
to the filtering rather than incoherency of the expert(s), or
may be due to a mixture of both.

The systems engineering process is longitudinal 
and hence offers the opportunity, not only to update 
prior distributions through Bayes' theorem, but also 
to reelicit from a common set of experts. This offers 
the opportunity to validate the choice of model, as 
well as to assess the calibration of the prior distri- 
butions. Furthermore, a learning environment can 
be created by appropriately feeding results back to 
the experts and supporting them in improving their 
ability to specify uncertainty in terms of probability. 

As discussed earlier, the quality of subjective prob- 
abilities from experts depends on both the elici- 
tation methods and the experts' experience. If an 
expert lacks experience, prior distributions will be 
uninformative or misleading, regardless of the elici- 
tation method. Equally, poorly designed elicitation 
processes may degrade the quality of information 
provided from experts. Fischhoff [49] proposed the 
following four necessary conditions to support im- 
proving judgement skills: (1) Abundant practice with 
a set of reasonably homogeneous tasks; (2) clear- 
cut criterion events for outcome feedback; (3) task- 
specific reinforcement; (4) explicit admission of the 
need for learning. There is extensive evidence that 
these criteria are often not achieved in practice [48, 
157]. 

Feedback is crucial for calibrating the expert and 
should be event-specific [48, 49, 157]. In other words, 
the feedback must be given with respect to assigning
probabilities to particular events and not to the
ability of the expert to assign probabilities to any 
situation. To increase the effectiveness of feedback in 
terms of learning, conditions that influence the event 
should recur as often as possible [49, 74]. There- 
fore the factors on which the measure is conditioned 
should be as few and as general as possible. 
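
One way to give event-specific feedback is to score the 
assessed probabilities separately for each class of event. 
The sketch below uses the Brier score (mean squared 
difference between the stated probability and the 0/1 
outcome) as the feedback measure. That choice, the function 
name brier_by_event and the records are assumptions made 
for illustration; the sources cited above do not prescribe 
a particular score. 

```python
from collections import defaultdict


def brier_by_event(records):
    """Mean Brier score per event class.

    records: iterable of (event_class, stated_probability, outcome),
    where outcome is 1 if the event occurred and 0 otherwise.
    Lower scores indicate better probability assessments.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for event, prob, outcome in records:
        sums[event] += (prob - outcome) ** 2
        counts[event] += 1
    return {event: sums[event] / counts[event] for event in sums}


# Hypothetical elicitation records for two recurring event classes.
records = [
    ("seal leak", 0.2, 0),
    ("seal leak", 0.2, 1),
    ("connector fault", 0.9, 1),
    ("connector fault", 0.7, 1),
]
scores = brier_by_event(records)
```

Grouping by a small number of general event classes keeps 
enough recurring cases in each class for the score to be 
informative, in line with the recommendation above. 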

5. DISCUSSION AND REFLECTIONS ON 
FUTURE DIRECTIONS 

We have attempted to give an overview of expert 
judgement applications within the field of reliabil- 
ity assessment during systems design. In doing so, 
a number of key points have arisen which we now 
revisit to summarize and discuss further. 

We have suggested that the whole systems en- 
gineering design process is akin to a control prob- 
lem. The control feedback loops are, however, driven 
largely not through revisiting decisions in the light 
of newly acquired system data, but through the use 
of expert judgements which assess the likely out- 
come of different system design decisions. Of course, 
in the wider context of new generations of systems, 
there are also feedback loops through the use of rel- 
evant data that reduce uncertainty not only on the 
system's own physical and engineering properties, 
but also on the manner of user interaction. It is also 
worth mentioning that requirements are frequently 
revised in light of experience with previous genera- 
tions of systems, and there is surely a role for statis- 
ticians in influencing the setting of such targets. In 
fact, with the increasing use of sensors that are able 
to record all sorts of aspects of system performance, 
environment and use, the opportunities for statisti- 
cal modeling of these aspects are greater than ever. 

Given that systems engineering stresses the importance 
of making trade-off decisions, we remarked 
that the reliability information required to support 
such decisions is, expressed abstractly, the dependence 
of the reliability metric 

r = r(d, p, u, m, c) 

on d, p, u, m and c, the choices made for design, 
production, usage, maintenance and changes. 
While defining such a function precisely would be 
impractical, we feel that this at least provides a con- 
ceptual model for the direction statisticians should 
be taking. Reliability optimization models, reliability 
growth models and other such models are all 
techniques used to provide partial approximations 
to this function. 

The fact that some decisions are made later in 
the design process means that models are sometimes 
used in ways that are uncomfortable to statisticians 
and mathematicians: For example, the use of a con- 
stant failure rate lifetime model (exponential dis- 
tribution) for computations early in the process and 
the later use of increasing failure rate models for the 
same system to help determine maintenance inter- 
vals may seem contradictory, but if the first model 
was applied with the knowledge that the mainte- 
nance intervals would be fixed post hoc to ensure 
that the failure rate is roughly constant, then there 
is no problem. This is a small illustration of the way 
the flexibility endowed by future decisions can en- 
sure appropriateness of modeling tools post hoc. 
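
The following Python sketch gives one possible reading of 
this example. It assumes, purely for illustration, a 
Weibull (power-law) cumulative hazard H(t) = (t/scale)^shape 
with shape > 1, minimal repair at each failure, and 
replacement to as-good-as-new at a fixed interval T. Under 
those assumptions the expected number of failures per 
interval is H(T), so the average failure rate is H(T)/T, and 
T can be chosen to match the constant rate used earlier in 
the design. The function names and numerical values are 
hypothetical. 

```python
def mean_failure_rate(interval: float, shape: float, scale: float) -> float:
    """Average failure rate over one replacement interval, H(T) / T."""
    return (interval / scale) ** shape / interval


def interval_for_rate(target_rate: float, shape: float, scale: float) -> float:
    """Replacement interval T whose average failure rate equals target_rate.

    Solves (T / scale) ** shape / T = target_rate for T.
    """
    if shape <= 1.0:
        raise ValueError("shape must exceed 1 (increasing failure rate)")
    return (target_rate * scale ** shape) ** (1.0 / (shape - 1.0))


# Hypothetical values: shape 2, characteristic life 1000 hours,
# and a constant rate of 1e-4 per hour assumed early in the design.
shape, scale = 2.0, 1000.0
target = 1e-4
T = interval_for_rate(target, shape, scale)   # roughly 100 hours
rate = mean_failure_rate(T, shape, scale)     # recovers the target rate
```

The instantaneous rate still rises within each interval, so 
the constant-rate model holds only on average and only to 
the extent that T is short relative to the scale parameter. 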

Many of the elicitation techniques applied within 
engineering design have crossed over from the prob- 
abilistic risk analysis area. Despite the many sim- 
ilarities with this area, there are a few key differ- 
ences. One is that many uncertainties will be af- 
fected in some way by future design decisions, so an 
understanding of dependence on design parameters 
is critical. Another critical aspect is that the design 
process is one of learning for the engineers. Hence 
the designers' insights change throughout the pro- 
cess and there is thus a need for problem and model 
structuring techniques to be applied: the qualita- 
tive structuring of statistical models has to be tied 
closely to the design development process. 

Two reliability modeling frameworks have been 
described that try to extend the scope of reliability 
techniques from "small world" problems to provide 
guidance over a wider range of problems. We noted 
also, however, that a holistic "whole life modeling" 
approach would need to be attractive to the different 
stakeholders associated with the system and that 
there is a need to provide a rational consensus across 
these parties about the uncertainties faced. Tech- 
niques developed in the PRA setting may be adapt- 
able to this situation, and integration with usual 
systems engineering approaches appears to be nat- 
ural. 

This brings us to some observations on the foun- 
dational aspects of reliability modeling in this area, 
because on the one hand, the methods are largely 
subjective, but on the other hand, there is a recog- 
nition of the limitations of Bayesian techniques. One 
such limitation is the need to establish rational con- 
sensus (as noted above) to break down the some- 
times adversarial relationship encountered between 
stakeholders. Another is that the nature of learning 
in engineering design is that new modeling needs are 
continually emerging and model structures, with 
corresponding likelihood functions, need to be 
adjusted to match. 

Engineering design is an area of great interest for 
statisticians, but involvement in this area requires 
some changes in mindset. Reliability is one of the 
many requirements that the designer is trying to 
juggle. Hence supporting the design process has to 
be done by giving insights into what is feasible. All 
decisions can be modulated later in the process as 
long as there is a feasible set of solutions: failure of 
the design process occurs when decisions taken ear- 
lier imply that the set of feasible solutions is empty. 
Historically, many reliability techniques have been 
applied too little or too late in the design process to 
inform it properly, and some practitioners, such as 
O'Connor [112], have been critical of statistical 
reliability work, seeing it as a numbers game. However, 
the increased use of expert judgement combined with 
more rapid information distribution through information 
technology systems gives real opportunities 
to "raise the game" as far as the impact of reliability 
is concerned. We have identified a whole set of 
problems in which there is scope for statisticians and 
operations researchers to play a role in developing 
new elicitation methods and modeling tools within 
the systems engineering design process. Surely, as we 
get more deeply involved in such areas, the insights 
into uncertainty elicitation will provide benefits for 
other application areas too. Although we would not 
go as far as Lindley's colleague in claiming "there are 
no problems left in statistics except the assessment 
of probability" [95], it is undeniably the case that 
expert judgement methods dramatically increase the 
scope for statistical work in engineering design prob- 
lems and that work in that area provides us with a 
new range of contexts within which new elicitation 
methods can be developed. 